Friendlier workflow errors (consolidated)#1849
Conversation
Phase 1: Add Ansi rendering helpers (frame, hint, note, help, code, inline) to @workflow/errors, and a chalk mock for readable snapshot tests. Phase 2: Add four context-violation error classes to @workflow/core (NotInWorkflowContextError, NotInStepContextError, NotInWorkflowOrStepContextError, UnavailableInWorkflowContextError) and apply them to all twelve user-facing throw sites so errors now include docs links and a structured "what/why/fix" frame. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Tighten phase 1 changeset to a single sentence (per pranaygp review) and switch to double-quoted frontmatter (per Copilot + repo convention). - Implement `ansifyName` to actually apply dim styling to workflow/ / step/ prefixes; add an `Ansi.dim` helper to `@workflow/errors` so callers don't need to import chalk directly. - Remove the `void getWorkflowMetadata;` workaround in context-errors.ts by dropping the unused value import (we only needed the type and symbol). - Render the plain-Error throw in `workflow/get-workflow-metadata.ts` with `Ansi.frame` + docs link so the VM path matches the structured-class styling from the sibling step path (still uses a plain Error to avoid the module-init cycle). - Guard `buildUnderline` against zero-length markers so a stray empty token can't produce a negative `String.repeat` count.
Adds a `.child()` and `.forRun(runId, workflowName)` child-logger API to the structured logger so runtime/step code doesn't have to repeat `workflowRunId`/`workflowName`/`stepId` on every call. Normalizes error metadata to structured `errorName` / `errorMessage` / `errorStack` fields instead of ad-hoc `error: err.message` strings, and adds comments to silent catches that swallow expected idempotency conflicts. Also folds in the pending changes from #1812 so that PR can be closed: - Standardize the console prefix to `[workflow-sdk]`. - Split the replay-timeout log into a warn-while-retrying vs. error-when-giving-up, and surface the underlying error when we can't mark a timed-out run as failed. - Include the error stack in the "Fatal runtime error during workflow setup" log and in the top-level user-code workflow error log so the stack surfaces in flattened log drains. - Drop the `[Workflows] "<runId>" - ` prefix from `buildWorkflowSuspensionMessage` — the structured logger now attaches run context. Supersedes #1812. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 4 of friendlier errors: introduce a `SerializationError` class with
an optional `hint` and a docs link (workflow-sdk.dev/err/serialization-failed),
and adopt it at every user-facing serialization boundary in @workflow/core:
- Locked ReadableStream at a workflow boundary
- Unregistered class / missing `classId` / missing `WORKFLOW_DESERIALIZE`
- Attempting to return step functions to clients or call workflow functions
directly
- Webhook `respondWith()` called outside a step
- `dehydrate*` / `getSerializeStream` failures (workflow args/return, step
args/return, stream chunks)
Internal invariants (format prefix length checks, unknown format bytes,
missing `STREAM_NAME_SYMBOL`, encryption key/size guards, etc.) now throw
`WorkflowRuntimeError` instead of plain `Error` so the classifier and logger
treat them consistently.
`formatSerializationError` now returns `{ message, hint }` so the hint
fragment can be rendered with the standard SerializationError framing
instead of being baked into the message string.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add describeError() that derives attribution and class-aware hints from existing error classes + RUN_ERROR_CODES — no event data changes. Wire into step failures, max-delivery exhaustion, run failures, and fatal setup errors so terminal logs include errorAttribution and a hint for known error types. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- `describeError(err, errorCode?)` now accepts an optional precomputed `RunErrorCode`. `classifyRunError(err)` only narrows to USER_ERROR / RUNTIME_ERROR, so the REPLAY_TIMEOUT and MAX_DELIVERIES_EXCEEDED branches were previously unreachable from the step / run failure log sites. Callers that know the failure category (runtime.ts for replay timeout and max-deliveries exhaustion) now pass the code in. - Context-violation checks use `instanceof` against the actual classes from context-errors.ts instead of a name-string set. Type-safe + survives class renames. - Wire the new hints through to the REPLAY_TIMEOUT and MAX_DELIVERIES_EXCEEDED log sites so those branches actually render a hint now. - 3 new tests cover the reachable code paths + precomputed-code override. - Changeset frontmatter switched to double quotes per repo convention.
Internal invariants now use WorkflowRuntimeError so describeError attributes them to the SDK: missing startedAt, VM generateKey, closure-vars outside step context, ENOTSUP. defineHook().resume() formats schema validation failures as a readable list instead of a JSON blob. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Observability renderers read persisted run_failed / step_failed event data,
not live Error instances. describeRunError takes { errorCode, errorName }
and returns the same { attribution, hint } shape as describeError, so the
CLI and web UI can derive user-vs-SDK framing from the event log directly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add `WorkflowBuildError` class in `@workflow/errors` with optional `hint` for an actionable next step, and apply it in `@workflow/builders` at user-facing sites: failed esbuild phases, unresolved built-in steps, and empty esbuild output now throw `WorkflowBuildError` with a hint pointing at the likely fix. Runtime invariants remain plain `Error`.
…docs link, redirect stack
- Drop the readonly `functionName` param-property on context-error classes so
util.inspect no longer prints a trailing `{ functionName: 'foo()' }` block.
- Replace the `DocLink` ("label: https://…") shape with a plain `DocsUrl`
template-literal type. Error output now renders a single clean line:
`docs: https://…` (new `Ansi.docs` helper) instead of the noisier
"note: Read more about foo(): https://…".
- Add throw helpers (`throwNotInWorkflowContext`, etc.) that call
`Error.captureStackTrace(err, stackStartFn)` on V8 engines so the top frame
of the thrown error points at the user's call site instead of at the gate
function inside the framework. Callers pass themselves as the boundary.
- Refactor `defineHook()` (both root and `/workflow`) to use named function
closures rather than `this.create`/`this.resume`, since the stack redirect
relies on a stable function identity that survives destructuring.
- Update context-errors.test.ts to snapshot the new `docs:` framing and to
add a regression test asserting the top stack frame is the user call site.
🦋 Changeset detectedLatest commit: 5b8d791 The changes in this PR will be included in the next version bump. This PR includes changesets to release 23 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
🧪 E2E Test Results✅ All tests passed Summary
Details by Category✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
✅ 📋 Other
|
Replace util.inspect's default object dump (which quote-escapes multi-line stacks and paragraph hints into a single-line JSON-y blob) with a workflow-aware formatter that composes the entire log line into a single string passed to console.error / console.warn. Highlights of the new output: - Per-run / per-step IDs render with their parsed friendly names so users see `wrun_… · simple (./workflows/1_simple)` instead of just the raw `workflowName: 'workflow//./workflows/1_simple//simple'`. - Color-coded attribution badge (user error red / sdk error magenta) paired with the error class in bold. - Hints render as a paragraph under `hint:` rather than a backslash- `\n`-escaped string. - Drops redundant fields (errorStack always; errorMessage when it's already in the parent message) to avoid double-printing. - Unknown fields fall through as a sorted `key value` tail so we never silently drop log information. @workflow/errors/ansi gains bold/red/magenta helpers used by the formatter. The web / web-shared packages don't consume stderr — they read structured event payloads from the World event log — so this is presentation-only at the runtime layer.
Resolve conflicts:
- packages/builders/src/base-builder.ts — keep WorkflowBuildError import
alongside main's removal of unused usesVercelWorld.
- packages/core/src/serialization.ts — main extracted serialization into
modular ./serialization/{client,step,workflow}.ts and reformatted the
error wrapper. Re-apply the SerializationError adoption on top of the
new structure:
- ./serialization/errors.ts now returns { message, hint } so callers
can pass the hint through SerializationError instead of baking it
into the message.
- The 3 modular serializers (client/step/workflow) and 4 dehydrate/
hydrate wrappers in serialization.ts now throw SerializationError
with a hint, unwrapping any inner SerializationError cause to keep
the more specific outer context label ("workflow arguments" vs
generic "workflow value").
- User-facing throws (ReadableStream is locked / step function not
found / workflow functions cannot be called directly / respondWith
outside step / step functions cannot be deserialized in client
context) are now SerializationError with hints.
- Internal invariants (stream-name validation, missing STREAM_NAME_
SYMBOL, closure-vars outside step context) are now WorkflowRuntime-
Error so they get classified consistently.
TooTallNate
left a comment
There was a problem hiding this comment.
Read through the consolidated stack carefully — context-violation classes, capture-stack helper, describe-error, the errors-package additions and ANSI subpath, the logger rewrite, and the step-handler dehydrate move. The overall direction is good and a lot of it is well-executed; the structured FramedContent / renderPlain / renderPretty split is the right architecture for "plain message + pretty terminal rendering" and the subpath split for chalk is correct. Finding most of the right tradeoffs.
I have one blocking item, plus a handful of follow-up concerns and observations to raise before merge.
Blocking
pr-artifacts/ directory needs to be removed before merge
Six files still in pr-artifacts/ (01-... through 05-... plus README). PR description explicitly says "Remove before merge" — they shouldn't ship to git history. Easy git rm pr-artifacts/, but it's the kind of thing that gets forgotten in the merge rush.
Higher-risk concerns
FatalError.is() widening is a runtime-behavior change
The static is() predicate now returns true for any Error carrying a truthy fatal property — not just name === 'FatalError'. This is the documented intent and enables SerializationError, ContextViolationError, etc. to opt in without inheritance. But it changes retry semantics — anything FatalError.is() returns true for skips retry budget.
Risk: a user-defined error class that incidentally has fatal = true (someone's class CustomError { fatal = true }) will silently become non-retryable in workflow runs. For most users this is fine; for anyone who rolled their own "fatal" convention, this is a surprise.
Two requests:
- The
friendlier-errors-consistencychangeset should beminor, notpatch, for@workflow/errors. Adding new error classes is minor; widening a publicis()predicate is at minimum minor. Verify. - Add a test that asserts a non-
ErrorPOJO with{ fatal: true }does NOT passFatalError.is()(theisErrorguard handles this, but lock it in).
Step-handler dehydrate move is correct but undertested
The "move dehydrate inside the user-code try/catch" change is the highest-leverage runtime-behavior change in this PR (along with the FatalError.is widening above). I traced it carefully and the new flow is correct: dehydrate failures now route through userCodeError → fatal branch → step_failed → no retry. The if (!userCodeFailed) guard is right; the new try/catch funnels into the same handler block. Confirmed end-to-end:
dehydrateStepReturnValueonly ever throwsSerializationError(own wrapping catch inserialization.ts)SerializationErrorhasreadonly fatal = trueFatalError.is(SerializationError)is true via the new widening- Routes to fatal branch →
step_failedevent → nostep_retrying
But step-handler.test.ts doesn't test this path at all. The test diff only updates mock shapes for the new forRun/child logger methods. The dehydrate mock at lines 118-127 is mockResolvedValue(...) — it never rejects. Add a test where it rejects with SerializationError and assert step_failed is emitted exactly once and step_retrying is never emitted. This is the regression-gate test the PR most needs and doesn't have.
There's also one subtle behavior change worth calling out: errors from trace() setup itself (OTEL config, etc.) used to escape to the queue (48 redeliveries). They now get caught by the new inner try/catch and treated as user-step failures (4 attempts, attributed to user). Probably fine — OTEL setup failures are deterministic — but worth a sentence in the changeset.
Other observations (non-blocking)
Minor design
-
hintis baked into.messageinWorkflowBuildErrorandSerializationError(super(\${message}\n\n${options.hint}`)). Future renderers that want to format hints separately can't recover the bare message. Consider keepingsuper(message)plain and exposinghint` separately; let renderers compose. Not blocking, but locks in a format. -
Box-drawing implemented twice:
renderPlainincontext-violation-error.tsre-implements the framing structure ofAnsi.frameto avoid pulling chalk into core. Defensible for the dependency reason (and the comment is clear), but the two implementations can drift. Worth a cross-reference comment in both places. -
Eight new throw helpers (
throwNotInWorkflowContextetc.) for four classes. Could be one genericthrowContextViolation(Class, fnName, docsUrl, stackStartFn). Current shape is more autocomplete-friendly; defensible either way.
Logger / log-format
- ANSI bytes can leak into stderr drains.
formatLogMetadatausesAnsi.bold/Ansi.redetc. which call chalk directly. Tests pass because vitest has no TTY (chalk level 0 → pass-through). But if anyone runspnpm devwithFORCE_COLOR=1, or stderr is piped through a wrapper that is a TTY upstream of a structured drain, ANSI bytes land in themessagefield. The OTel span event path (addEvent({ message, ...merged })inlogger.ts:73) is clean — it uses the unformatted message. But the console path is environment-dependent. Suggest using achalk.Instance({ level: 0 })for log formatting, or stripping ANSI before stderr write. Not blocking, but worth knowing. stepLogger.*callsites bypassforRun-based scoping.step-handler.tshas 6 callsites (lines 371, 591, 644, 711, 764, 778) that use the namespace-stepstepLoggerand manually re-pass{ workflowRunId, workflowName, stepId, stepName }each time. TheruntimeLogger.forRun(...)migration is great where it's done; these bypass it. Could bestepLogger.forRun(...).child({ stepId, stepName }).- Lowercase
[workflow]prefix still exists inruntime/world.ts:90,next/src/loader.ts:648,next/src/builder-deferred.ts:153/174/1010. These are directconsole.*calls (not logger), so technically out of scope for the prefix-standardization changeset, but worth noting if "all SDK output prefixed with[workflow-sdk]" is the goal.
Module-realm hazards
describe-error.tsmixesClass.is(err)(duck-typed; survives ESM/CJS dual-package hazards) witherr instanceof WorkflowRuntimeError(nominal; doesn't survive). Recommend.is()everywhere for consistency in a multi-realm ecosystem.
Minor
inlineinerrors/src/ansi.ts(~100 lines for the tagged template that builds Rust-compiler-style underlines) is impressively complex. Worth checking it has snapshot tests for: no markers, single marker, multiple markers, marker at end of line, color-on vs color-off, multi-line input. Skimmed the test file; coverage is thinner than the implementation deserves.- The "PR description" mentions
esbuild.context().rebuild()failures bypass the newWorkflowBuildErrorpath. The most common build-error case (unresolved imports during HMR) won't get the friendly framing. Worth a follow-up issue.
Doc
- Some doc gaps for users on whether error message strings are stable API. The new
describeErrorfunction consumeserr.nameand class identity, which suggests messages aren't API. Worth adding a CONTRIBUTING note.
To summarize what I'd want before merge:
- Remove
pr-artifacts/. - Add the regression-gate test for
dehydrateStepReturnValuerejecting instep-handler.test.ts. - Verify changeset bumps for the
FatalError.iswidening are correct (minor at least). - Add the
isErrorguard test locking in that{ fatal: true }POJOs are not classified as fatal.
Everything else is non-blocking and could be follow-ups.
VaguelySerious
left a comment
There was a problem hiding this comment.
AI review: blocking issues found
| return tail | ||
| ? `${this.name}: ${pretty}\n${tail}` | ||
| : `${this.name}: ${pretty}`; | ||
| } |
There was a problem hiding this comment.
AI Review: Blocking
[util.inspect.custom] duplicates every detail line in the rendered output. Repro on the built dist:
import { inspect } from 'node:util';
import { NotInWorkflowContextError } from '@workflow/core/...';
console.log(inspect(new NotInWorkflowContextError('createHook()', 'https://example.com/docs')));produces:
NotInWorkflowContextError: `createHook()` can only be called inside a workflow function
╰▶ docs: https://example.com/docs
╰▶ docs: https://example.com/docs ← duplicated
at userCallSite (...)
For UnavailableInWorkflowContextError (3 detail lines, with active workflow context), all three render twice.
Root cause: .message is multi-line (title\n╰▶ docs: …), so V8's .stack is Name: messageLine1\nmessageLine2\nmessageLine3\n at …. split('\n').slice(1).join('\n') only strips line 1, gluing the remaining message lines onto the prepended pretty form.
Suggested fix:
const messageLines = this.message.split('\n').length;
const tail = (this.stack ?? '').split('\n').slice(messageLines).join('\n');This is the primary terminal display path users see (console.error(err), framework dev overlays, uncaught throws), so worth fixing before merge. The existing 'util.inspect(err) reveals the pretty framed form' test only asserts .toContain('╰▶') and won't catch this — recommend adding a not.toMatch(/╰▶ docs:.*╰▶ docs:/s) or a toMatchInlineSnapshot with masked stack frames.
There was a problem hiding this comment.
Fixed in 9d45cdf — counted the actual message line count instead of slicing only line 1. Added a regression test in context-errors.test.ts that asserts ╰▶ docs: appears exactly once and that no detail line repeats.
|
|
||
| constructor(message: string, options?: WorkflowBuildErrorOptions) { | ||
| const body = options?.hint ? `${message}\n\n${options.hint}` : message; | ||
| super(body, { cause: options?.cause }); |
There was a problem hiding this comment.
AI Review: Note
This super(body, { cause: options?.cause }) call (and the equivalent in SerializationError below) hits WorkflowError's constructor, which unconditionally does this.cause = options?.cause. That makes cause: undefined an enumerable own property on every instance, so util.inspect(err) of a no-cause error renders { cause: undefined, ... }.
Not new to this PR — WorkflowError already has the issue — but the new error classes added here inherit it, so the leak surfaces on more classes post-merge.
If you'd rather not address it in this PR, fine — but a one-liner if (options?.cause !== undefined) this.cause = options.cause; in WorkflowError's constructor would clean up the inspect output for all subclasses.
There was a problem hiding this comment.
Fixed in 9d45cdf — wrapped the assignment in if (options?.cause !== undefined). super(message, { cause }) already sets a non-enumerable .cause when provided, so the only thing my code needed to drop was the unconditional re-assignment that turned it into an enumerable own property. Cleans up util.inspect(err) output for every WorkflowError subclass. Changeset entry added at .changeset/workflow-error-cause-undefined.md.
| 'wrun_test123', | ||
| expect.any(String), | ||
| expect.objectContaining({ stepId: 'step_abc' }) | ||
| ); |
There was a problem hiding this comment.
AI Review: Note
The most user-visible behavior change in this PR — "fatal errors fail in 1 attempt, not 4" — has unit coverage on FatalError.is(ContextViolationError) === true and SerializationError.fatal === true, but no end-to-end test on the step handler itself asserting that a fatal user error emits exactly one step_failed event (no retries) and a non-fatal Error does retry to maxRetries.
If someone removes fatal = true from ContextViolationError (or moves the dehydrate call back outside the user-code try/catch) later, the unit tests stay green and the regression returns silently. A small integration-style test exercising the retry loop with a fatal vs. non-fatal user error would be a cheap regression gate.
There was a problem hiding this comment.
Done in 9d45cdf — added a step-handler fatal vs retryable behavior describe block in step-handler.test.ts that exercises the live retry-loop wiring. Three cases:
- error with
fatal: true→ exactly onestep_failed, zerostep_retrying - non-fatal first-attempt
Error→step_retryingonce, nostep_failed - non-fatal final attempt →
step_failedonce
Catches the case where fatal = true is removed from a context-violation class — the FatalError.is() unit tests would stay green but this test would flip from "1 step_failed" to "4 step_retrying then 1 step_failed."
| message.includes(metadata.errorMessage as string) | ||
| ) { | ||
| redundant.add('errorMessage'); | ||
| } |
There was a problem hiding this comment.
AI Review: Nit
message.includes(metadata.errorMessage) is brittle: it works for the current call sites that pass ${framing}\n${stack} (which embeds the full message), but a future caller that passes a truncated errorMessage field and a full message would fail to dedupe and double-print. Not actionable today, just flagging if snapshot tests start drifting later.
There was a problem hiding this comment.
Acknowledged. Agree it's a heuristic that depends on current call sites. Leaving as-is for now since all callers compose the message via ${framing}\n${stack} (and the stack always begins with the error name + message), but if a future caller passes a truncated errorMessage we'd need to switch to a more explicit "already-rendered fields" set.
| }, | ||
| "dependencies": { | ||
| "@workflow/utils": "workspace:*", | ||
| "chalk": "5.6.2", |
There was a problem hiding this comment.
AI Review: Nit
The PR description says the change "no longer pulls chalk into every consumer" — accurate at the bundle level (the main entry doesn't import chalk, so tree-shaken consumers don't pay), but chalk is still in dependencies so it lands in node_modules for every install. Minor framing thing — if you wanted to avoid the install footprint for consumers that never touch ./ansi, this would have to move to optionalDependencies or peerDependencies. Probably not worth it; just calling out the gap between the description and the install reality.
There was a problem hiding this comment.
Good catch on the framing. Updating to "no longer pulls chalk into the bundle for consumers that don't import the ./ansi subpath." Moving chalk out of dependencies to peerDependencies / optionalDependencies is more disruptive (anyone importing ./ansi would have to install chalk explicitly), so leaving it as-is — the install footprint trade-off is fine for the SDK's audience.
The job never runs `pnpm install` (it just calls `node` against a checked-in script), so the pnpm store path never exists. The post-job `actions/setup-node@v4` cache-save then fails with `Path Validation Error: Path(s) specified in the action for caching do(es) not exist` and red-X's the entire job even though the matrix step succeeded. The setup-workflow-dev composite already has a `cache-pnpm` opt-out input for this exact case — wire it through here.
- ContextViolationError: util.inspect(err) duplicated every framed detail
line because the stack-tail strip only sliced the first message line.
V8's Error.stack reads `Name: messageLine1\n messageLine2\n at ...`,
so for our multi-line `title\n╰▶ docs: …` messages every detail line
was getting prepended twice (once in the pretty form, once via the
unsliced message tail). Count the actual message lines and slice past
all of them. Repro test asserts `╰▶ docs:` appears exactly once.
- WorkflowError: stop assigning `cause: undefined` as an enumerable own
property when no cause is provided. Subclasses (every error in this PR)
inherit the parent constructor; the unconditional assignment polluted
`util.inspect(err)` output with `{ cause: undefined, … }` on every
no-cause instance. The `super(...)` call already conditionally sets
`.cause` non-enumerably when `options.cause` is provided.
- step-handler.test.ts: add a regression-gate suite that exercises the
fatal-vs-retryable retry-loop wiring directly. Asserts that an error
with `fatal: true` produces exactly one `step_failed` event with no
`step_retrying`, and that a non-fatal `Error` retries via
`step_retrying` on early attempts and emits `step_failed` once the
retry budget is exhausted. Catches the silent-regression case where
`fatal = true` is removed from a context-violation error class but
the `FatalError.is()` unit tests stay green.
VaguelySerious
left a comment
There was a problem hiding this comment.
Looks good enough for me to ship and test on 5.x.x
Can we drastically shorten the changesets? I think this should be a single changeset on core/error packages that explains the user-facing changes mostly, not a file-by-file breakdown of what's changed. Also let's remove PR artifacts. Then good to ship.
Address review feedback to drastically shorten the changesets — fold the 15 file-by-file entries into a single user-facing changeset for @workflow/core / errors / builders / utils. Also drop the pr-artifacts/ folder (reviewer-only log captures, no longer needed).
|
Done in 351971a:
Ready to ship. |
Five user-driven fixes from manual smoke-testing of #1849: 1. Logger layout. composeLogLine() now puts the structured-fields block (attribution badge, run/step IDs, error code) **between** the framing line and the stack body, instead of after it where 30+ lines of stack buried the most useful information. The framing stays at the top, stack at the bottom, structured info readable at a glance. 2. Stack trim. Drops framework-internal frames (`node_modules/.pnpm/`, `node:internal/`, Turbopack-bundled `node_modules__pnpm_*` chunks, `_next_dist_*` chunks) and caps the surviving frame count at 6 so the stack stays compact even on heavy async wrappers. Suppressed runs emit one summary line so users know the trim happened. 3. Wrapper-route noise. The nextjs-turbopack workbench's start route was catching `WorkflowRunFailedError` rejection on `Promise.race([readLoop(), run.returnValue])` and re-logging it via `console.error('Error in workflow stream:', error)` plus `controller.error(error)` — which then triggered Next.js's `⨯ failed to pipe response` overlay. The SDK already logs the failure cleanly upstream and the runId is on the response header, so the wrapper now closes the SSE stream cleanly on WorkflowRunFailedError. 4. Consistent framed `╰▶ hint:` / `╰▶ docs:` layout for all errors that carry a hint or docs slug. WorkflowError, SerializationError, and WorkflowBuildError now share one `appendFramedDetails` helper matching the box-drawing structure that ContextViolationError already used. Was: blank-line-separated `Learn more: <url>`. Now: one tree, indistinguishable from context-violation rendering. 5. Drop the duplicate logger-side `hint` field. Hints now live on the error message only — actionable hints get serialized into the event log, rehydrated on the workflow side, and shown in observability automatically. The previous logger-only hint duplicated stderr but never made it past the step boundary. Updated SerializationError hint to point at the foundations doc ("Ensure you're returning workflow serializable types. Check the serialization docs to see what's serializable: https://workflow-sdk.dev/docs/foundations/serialization") instead of the hardcoded `(plain objects, arrays, primitives, …)` list, which drifted out of sync as the supported types grew. Same hint reuses for step args, workflow args/return, stream messages, and any other site that goes through `formatSerializationError`. Also retitled the retry summary `3 retries` → `3 max retries` since "3 retries" next to "4 attempts" was ambiguous (already-happened vs. budget).
- ErrorStackBlock (web observability): show just the first non-empty trimmed line of the error message in the card title with single-line truncation. Multi-line messages (`Failed to serialize step return value\n╰▶ hint: …`) were rendering the entire framed body in the title, pushing the copy button off-screen and burying the scannability of the headline. Full message stays in the body via the stack (V8 prepends `Name: message` to `Error.stack`), so no information is lost; hover-tooltip exposes the full title text. - Persisted error message: drop the `Step "step//./.../foo"` machine name from `Step failed after N retries: …` and `Step exceeded max retries (…)` strings. Observability already attributes the event to a specific step via the UI tree, and the CLI logger emits the friendly `Step foo (./...) hit max retries` framing on its own line. Embedding the raw `step//./...` machine name in the persisted message text was duplicate noise.
Co-authored-by: Peter Wielander <mittgfu@gmail.com> Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>
Co-authored-by: Peter Wielander <mittgfu@gmail.com> Signed-off-by: Pranay Prakash <pranay.gp@gmail.com>
…lier-errors-followups * origin-https/main: [workbench] Add TanStack Start workbench and tests (#1875) Atomically dedupe duplicate step_created/wait_created events in world-local (#1877) Split tarball hosting out of docs into its own project (#1893) Replace fixed-sleep hook waits with event-driven waitForHook helper (#1879) # Conflicts: # pnpm-lock.yaml
…wups' into pranaygp/friendlier-errors-followups * origin/pranaygp/friendlier-errors-followups: Update .changeset/pretty-log-format.md Update .changeset/friendlier-errors.md
The class no longer attaches a slug-based `╰▶ docs:` line — the foundations URL is embedded directly in the hint via the `formatSerializationError` helper in @workflow/core. Update the test expectations accordingly: - bare-title case is now a single line (no docs link) - hint case renders one `╰▶ hint: …` branch (no second branch)
Four `should throw error for an unsupported type` cases were still asserting on the old hardcoded type list. Update to the new hint phrasing that points at the foundations doc, matching the change in `formatSerializationError` (`packages/core/src/serialization/errors.ts`).
| '`crypto.subtle.generateKey()` is not available inside a workflow function. Move key generation to a step function where full Node.js crypto is available.' | ||
| ); |
There was a problem hiding this comment.
Let's keep this as a not implemented error. generateKey is a web crypto thing I think, not a nodejs function
| const firstLine = | ||
| message.split('\n').find((line) => line.trim().length > 0) ?? message; | ||
| return firstLine.trim(); | ||
| } |
There was a problem hiding this comment.
even if it's just 1 line, we should have a max character length or something so that we don't render very long error messages in the title, and just leave the body to have the full message
TooTallNate
left a comment
There was a problem hiding this comment.
Withdrawing my prior REQUEST_CHANGES — both blockers are addressed and the related polish on top is welcome.
What's resolved
-
pr-artifacts/removed (351971ac08). The 6 files plus README are gone, and the changesets were consolidated into a singlefriendlier-errors.md+pretty-log-format.mdpair. ✓ -
Regression test for fatal-vs-retryable step routing added (
9d45cdf6ec,step-handler.test.ts:779-887). Three new tests cover exactly the gap I flagged:emits exactly one step_failed and does not re-queue when the step throws an error with fatal=true— uses a user-definedclass FatalUserError extends Error { readonly fatal = true }. This is precisely the case where a third-party error withfatal: trueshould be treated as fatal — explicitly tested end-to-end through the handler's retry loop.schedules a retry (and does not fail the step) on the first attempt of a non-fatal Error— positive case for the retry path.emits step_failed once the non-fatal retry budget is exhausted— terminal case aftermaxRetries.
All three pass on the PR branch when running the full file (820 core tests green).
-
util.inspectdedup fix incontext-violation-error.ts(lines 113-127) — the originalslice(1)only dropped the first line of.stack, so multi-line.messageended up duplicated whenutil.inspect()ran. Now slices pastmessageLineCountlines. There's a regression test incontext-errors.test.ts:139-156that asserts the framed╰▶ docs:line appears exactly once — solid catch from review. -
causeleak fix inWorkflowError(errors/src/index.ts:79-87) —this.cause = options?.causewas unconditionally setting the property, making it an enumerable own property even when undefined. The new conditional avoids pollutingutil.inspect(err)output with{ cause: undefined, ... }on every no-cause subclass. Subtle but correct.
Reframing on FatalError.is() widening
Withdrawing my prior concern about the changeset bump type. Confirmed: this is beta-v5-only and not backported to stable, so per AGENTS.md the patch bump is fine — bump type is cosmetic on main in pre-release mode and only matters for backport-to-stable, which doesn't apply here.
The semantic widening makes more sense in context: with the upcoming custom-class serialization work in #1851 routing thrown values through the serialization pipeline, any class with fatal: true opts cleanly into the no-retry path without inheritance gymnastics. That's a clean architecture story and explicitly tested by the new regression case. Approving the design.
Non-blocking observations remaining
These were all flagged previously as non-blocking and have not been addressed. Not gating but worth tracking:
- 6
stepLogger.*callsites still bypass the scopedstepRuntimeLogger(lines 374, 594, 651, 717, 774, 788 in current step-handler.ts). ThestepLoggernamespace is intentional fordebug:workflow:step:*filtering, so the choice is defensible — but it means the IDs the newforRunfactory was designed to centralize get manually re-passed in those 6 places.stepLogger.forRun(...).child({ stepId, stepName })would close this. - Lowercase
[workflow]prefix still inruntime/world.ts:90and a few Next.js builder files. These are directconsole.*calls (not logger callsites), so technically out of scope for the prefix-standardization work, but worth aligning eventually. - ANSI bytes can leak into stderr drains through
formatLogMetadata'sAnsi.*calls. The team has explicitly chosen this trade-off (comment inlog-format.ts:24notes "web/web-shared do not consume stderr at all — they read CBOR/JSON event payloads from the World event log"). Acceptable. - Test isolation nit in the new describe block.
step-handler fatal vs retryable behaviorhas nobeforeAllto initializecapturedHandlerRef; it relies on the siblingstep-handler 409 handlingblock'sbeforeAllrunning first. Means the new tests fail when run in isolation (vitest -t "fatal vs retryable"). Easy to fix by duplicating thebeforeAll, but only relevant to anyone reordering or pruning the file later.
Wrap-up
Big PR, well-executed end-to-end. The structured FramedContent/renderPlain/renderPretty split, the chalk subpath, the consolidated changeset, the regression tests added in response to review — all good craftsmanship. Ship.
Summary
Consolidates the eight-PR friendlier-errors stack into a single PR. Inspired by @Schniz's stalled #706. Superseded by this PR: #1831, #1832, #1836, #1837, #1838, #1839, #1840.
What's included
@workflow/errorsSerializationError,WorkflowBuildErrorclasses (with optionalhintfield).Ansirendering helpers (frame,code,docs,dim,inline) — now lives under the@workflow/errors/ansisubpath so the main entry doesn't pullchalkinto every consumer.FatalError.is(err)widened to recognize any error with afatal: trueown property.@workflow/core— context violationsNotInWorkflowContextError,NotInStepContextError,NotInWorkflowOrStepContextError,UnavailableInWorkflowContextError) applied to twelve user-facing throw sites. Each includes a docs link..message/.stackare plain text — the colored framed form renders lazily via[util.inspect.custom]/toString(), so structured logs and log drains no longer contain raw\x1B[...mbytes. All four classes setfatal = true, socreateHook()-from-a-step fails immediately instead of burning three retry attempts. Thrown errors redirect their stack to the user's call site via a sharedredirectStackToCallerhelper so terminal overlays (Next.js, Turbopack, VS Code) point at user code.@workflow/core— serializationSerializationErrorapplied to all user-facing serialization boundaries: stream locking, unregistered classes, missingWORKFLOW_DESERIALIZE, step-function / workflow-function misuse, and dehydrate/hydrate failures for workflow args, step args, and return values.@workflow/core— runtime logger.child()and.forRun(runId, workflowName)for stable per-run context, standardized[workflow-sdk]prefix, error stacks surfaced in log drains, clarified replay-timeout phrasing (warn while retrying vs. error when giving up).@workflow/core— attributiondescribeError(err)anddescribeRunError({ errorCode, errorName })compute user-vs-SDK attribution + class-aware hints, either from a liveErrorinstance or from persisted failure-event fields. Exposed under the public@workflow/core/describe-errorsubpath for CLI / web consumption. Terminal logs at step-failure, max-retries, run-failure, and fatal-setup sites now includeerrorAttributionmetadata and hint text.@workflow/core— consistencystartedAt, VMcrypto.subtle.generateKey, closure-vars outside a step context,ENOTSUP) now throwWorkflowRuntimeErrorso they are attributed to the SDK.defineHook().resume()formats schema validation failures as a readable bulleted list instead of a raw JSON dump.@workflow/builders— build-timeWorkflowBuildErrorwith ahintpointing at the likely fix.Fixes from manual testing (
createHook()inside a step)Surfaced three issues after running the earlier stack end-to-end and addressed in the final commits:
.messagebecause chalk resolves at construction time — fixed by storing plain text and rendering pretty lazily (seecontext-errors-plain-messagechangeset).fatal: trueand wideningFatalError.is()(seecontext-errors-fatalchangeset).captureStackTracefeature-detect in two files — fixed by extracting a sharedredirectStackToCallerhelper (seecapture-stack-sharedchangeset).Follow-up round (latest commits)
Manual testing uncovered two more issues, fixed in the last two commits on this branch:
SerializationErrorstill looped through 4 retries. Root cause:dehydrateStepReturnValue()was called outside the step-handler try/catch, soFatalError.is()never saw the error. Two-part fix: markSerializationErrorasfatal = true, and move the dehydration call inside the user-code try/catch instep-handler.tsso the error routes throughuserCodeFailed→step_failed(seeserialization-error-fatalchangeset). Same non-POJO return now fails in ~1.6s / 1 block, not ~21s / 4 blocks.toMatchInlineSnapshot-backed tests fordescribeError()payloads (every attribution path) and the scoped-logger call signature for the two canonical runtime failure sites. Regression gate on the exact field shapes users see in their log drains.Post-review polish (latest commits)
Driven by manual testing in tmux + reviewer feedback after the initial review pass:
user error · FatalError,run wrun_…,step step_… · add (./workflows/x)) between the framing line and the stack body, instead of after — the most useful info now sits at the top of each block instead of being buried under 30+ stack lines.composeLogLine()drops framework-internal frames (node_modules/.pnpm/,node:internal/, Turbopack-bundlednode_modules__pnpm_*and_next_dist_*chunks) and caps the surviving frames at 6. Suppressed runs emit one summary line so users know the stack was trimmed.step//./workflows/foo//barrenders asbar (./workflows/foo)everywhere we log a step or workflow name, via newformatStepName/formatWorkflowNamehelpers in@workflow/utils.╰▶ hint:/╰▶ docs:framing across all errors with hints or docs slugs.WorkflowError,SerializationError, andWorkflowBuildErrornow share oneappendFramedDetailshelper, matching the box-drawing treeContextViolationErroralready used. The blank-line-separatedLearn more: <url>style is gone.hintfield from runtime logger metadata. Hints get serialized into the event log on the message itself, rehydrate on the workflow side, and surface in observability. The previous logger-only hint duplicated stderr but never made it past the step boundary.(plain objects, arrays, primitives, …)list, which drifted out of sync as the supported types grew. Reuses across step args, workflow args/return, stream messages, and any other site that goes throughformatSerializationError.app/api/workflows/start/route.ts's SSE wrapper was catchingWorkflowRunFailedErrorrejection and re-emitting it viaconsole.error('Error in workflow stream:', error)+controller.error(error), which then triggered Next.js's⨯ failed to pipe responseoverlay. The SDK already logs the failure cleanly, so the wrapper now closes the stream gracefully onWorkflowRunFailedError.ErrorStackBlocktitle trim in web observability. Multi-line error messages (Failed to serialize step return value\n╰▶ hint: …) were rendering the entire framed body in the card title, pushing the copy button off-screen. Title now shows just the first non-empty trimmed line with single-line truncation; full message stays in the body viaError.stack.Step "step//./.../foo" failed after N retries: …andStep "step//.../foo" exceeded max retries (…)now readStep failed after N retries: …/Step exceeded max retries (…). Observability already attributes the event to a specific step via the UI tree.4 attempts · 3 max retriesinstead of3 retries(which was ambiguous next to "4 attempts").WorkflowErrorno longer leakscause: undefinedas an enumerable own property when no cause is provided — every subclass'sutil.inspect(err)output stays clean.ContextViolationError[util.inspect.custom]counted message lines correctly when slicing the stack tail, fixing duplicated╰▶ docs:lines on multi-detail errors.fatal: trueerror emits onestep_failedand zerostep_retrying; a non-fatalErrorretries viastep_retryinguntil the budget is exhausted, then emitsstep_failed. Catches the silent-regression case wherefatal = trueis removed from a class butFatalError.is()unit tests stay green.Addressed review feedback
errorspackage depending on chalk" —Ansiis now on a subpath; the main entry has no chalk dep path.Error.captureStackTraceguard/cast" — extracted topackages/core/src/capture-stack.ts.[util.inspect.custom]duplicates every detail line" — fixed by counting message lines instead of slicing only the first.cause: undefinedenumerable own property" — wrapped the assignment inif (options?.cause !== undefined).step_failed/step_retryingevent counts directly.friendlier-errors.mdcovering core/errors/builders/utils.Manual test plan
All sections below exercise different parts of the stack. Start
cd workbench/nextjs-turbopack && pnpm devunless otherwise noted.1. Context-violation errors (phase 1 + 2 + followups)
A convenient smoke route at
workbench/nextjs-turbopack/app/api/friendlier-errors-smoke/route.ts:createHook()outside workflow —?which=createHook. Terminal shows a framed box with title`createHook()` can only be called inside a workflow functionand adocs: https://…/workflow/create-hookline (polished — no longernote: Read more about…).sleep()outside workflow —?which=sleep. Same framing, docs URL ends in.../workflow/sleep.getStepMetadata()in a workflow function (not a step) — add to a"use workflow"file; expect title "can only be called inside a step function".getWorkflowMetadata()in application code —?which=getWorkflowMetadata. Expect "workflow or step function".resumeHook()inside a workflow — callresumeHook(token, payload)from inside a"use workflow"function. Title:`resumeHook()` cannot be called from a workflow context, plus a linethis call was made from the workflow//./src/workflows/example.ts//myWorkflow workflow context.with theworkflow/prefix dimmed.at ...line oferr.stackshould reference your route handler, not a frame inside@workflow/core. Same goes for the Next.js dev overlay.functionNameleak — the JSON response body should NOT contain afunctionNameproperty on the error object (it used to via the old constructor param-property)..message/.stack(new in this PR) — the JSON response body'smessageandstackstrings must contain no\x1B[bytes; they're plain text. In the terminal,console.error(err)still renders the pretty framed version viautil.inspect.createHook()inside a"use step"function:add(…). You should see one[workflow-sdk]fatal-error log block (Step "X" threw a FatalError — bubbling up...), not four. The step fails immediately.2. Runtime logger metadata (phase 3)
[workflow-sdk]. Grep your terminal output; there should be no[Workflows]or other prefixes.workflowRunIdandworkflowNameas metadata fields instead of being baked into the message string.warnlevel, phrasing includes "took too long — will retry".errorlevel, phrasing includes "gave up".errorStack), not just the one-line error message.3.
SerializationError(phase 4)Cause a user-facing serialization failure:
[workflow-sdk]log witherrorName: 'SerializationError'.errorAttribution: 'user'.╰▶ hint:line on the error message itself — the URL points athttps://workflow-sdk.dev/docs/foundations/serialization(foundations doc, not a hardcoded type list). The hint travels with the persisted event into observability.getWritable('x').getWriter()twice on the same stream — expect aSerializationErrorwith the same╰▶ hint:framing.4.
describeErrorattribution (phase 5)For each of these, inspect the
[workflow-sdk]log at failure time:Error—throw new Error('boom')from a step →errorAttribution: 'user', no framed hint on the message.SerializationError→errorAttribution: 'user'+╰▶ hint:line on the message pointing at the foundations doc.errorAttribution: 'user'+╰▶ docs:line on the message pointing at the per-API reference page.WorkflowRuntimeError—throw new WorkflowRuntimeError('invariant')from a step →errorAttribution: 'sdk'.WORKFLOW_REPLAY_TIMEOUT_MS=50, run a non-trivial workflow → after retries exhaust,errorAttribution: 'sdk',errorCode: 'REPLAY_TIMEOUT'.errorAttribution: 'sdk',errorCode: 'MAX_DELIVERIES_EXCEEDED'.5. Consistency pass (phase 6)
defineHook().resume()— callhook.resume(token, invalidBody). Expect a readable bulleted list of validation issues (one per line,at "field": message), not a raw JSON dump ofZodError.issues.crypto.subtle.generateKey()inside workflow VM — call it from a"use workflow"function. Expect a clear message explaining why it's disabled + "move this into a step function", witherrorAttribution: 'sdk'.6.
describeErrorsubpath (phase 7 foundation)Create
scratch.tsat repo root:Run
pnpm tsx scratch.ts.describeRunError({ errorCode: 'USER_ERROR', errorName: 'SerializationError' })→{ attribution: 'user', errorCode: 'USER_ERROR', hint: 'A value…serialized…' }.describeRunError({ errorCode: 'RUNTIME_ERROR' })→{ attribution: 'sdk', hint: 'This is an internal workflow SDK error…' }.describeError(new SerializationError('x'))anddescribeRunError({ errorCode: 'USER_ERROR', errorName: 'SerializationError' })return the same shape and hint string.@workflow/core/describe-errorandpnpm tsxruns without module-resolution errors.7.
WorkflowBuildError(phase 8)Exercise the build pipeline, not the runtime. Use
workbench/nextjs-turbopackand runpnpm build(notpnpm dev).WorkflowBuildErrortitled "Build failed during workflows bundle" followed by a blank line andhint: Review the esbuild errors above…. The original esbuild errors remain printed above (not suppressed).mv node_modules/workflow node_modules/workflow-bakand runpnpm build. ExpectWorkflowBuildError: Failed to resolve built-in steps sources.+hint: run \pnpm install workflow`…`. Restore afterwards.WorkflowBuildError: No output files generated from esbuild+ hint mentioning"use workflow"/"use step"directives..is()discriminator — in a scratch script:WorkflowBuildError.is(new WorkflowBuildError('x', { hint: 'y' }))returnstrue;WorkflowBuildError.is(new Error('x'))returnsfalse.pnpm dev). Confirm noWorkflowBuildErrorshows up; this class is build-time only.8. New
@workflow/errors/ansisubpath (final PR)import { Ansi } from '@workflow/errors'no longer works (and wasn't intended to — the helpers were always namespaced). Confirmimport * as Ansi from '@workflow/errors/ansi'resolves.import { SerializationError } from '@workflow/errors') no longer pullschalkinto its bundle. Check a production bundle orpnpm why chalkfrom a dependent context.9.
FatalError.is()widening (final PR)FatalError.is(new FatalError('x'))—true.FatalError.is(new NotInWorkflowContextError('createHook()', 'https://…'))—true(viafatal: trueown property).FatalError.is(new Error('x'))—false.FatalError.is({ fatal: true })—false(must be anError-shaped value).Unit tests
All packages typecheck clean; relevant test files pass:
pnpm --filter @workflow/errors test— 26 tests (Ansi, SerializationError, WorkflowBuildError, FatalError widening, framed-details consolidation)pnpm --filter @workflow/core exec vitest run src/log-format.test.ts src/logger.test.ts src/context-errors.test.ts src/describe-error.test.ts src/runtime/step-handler.test.ts— 76 tests, including newcomposeLogLinesnapshot tests and the fatal-vs-retryable step-handler integration suitepnpm --filter @workflow/builders test— 129 testspnpm typecheck— clean across workspaceSmoke-test harness (local)
For reviewers reproducing the manual test plan, a one-shot tmux harness lives at
/tmp/wf-1849-smoke.sh(built during testing). Pane layout:The harness's
workbench/nextjs-turbopack/workflows/smoke.tsandapp/api/friendlier-errors-smoke/route.tsare kept out of git (workbench-local) — set them up via theworkflow-init-style instructions in the prior conversation if you want to reuse the harness.🤖 Generated with Claude Code